(Partially abridged from R geodata workshop 2019 & USC POIR Workshop on Maps in R, 2017)

Plotting on world maps

Coordinate systems

We usually work with two-dimensional vector data, i.e., points located within a coordinate reference system (CRS).

For example, the point (-69.94°, 42.6°) is represented in the WGS84 CRS (the world geodetic system used by GPS).

Points describe location on a sphere (well, actually WGS84 models Earth as an ellipsoid) and its components are given in degrees:

  • -69.94° is the longitude (positive/negative: East/West of IERS reference meridian);
  • 42.6° is the latitude (positive/negative: North/South of the Equator).

Points can be connected to form other geometric shapes such as lines or polygons. Points and shapes can represent many geographic entities: cities, buildings, countries, rivers, lakes, etc.

Packages

We need third-party packages to extend R in order to work with geographical data (or geodata). Following are some examples:

  • maps - contains geo-data for national borders and administrative regions for several countries
  • rgeos, maptools - Misc. functions for operations on geometries
  • sf (“Simple Features for R”): reading, writing and transforming spatial data
  • rnaturalearth - allows interaction with Natural Earth map data (more on this later).
install.packages(c("maps", "sf"))

Basic maps

As a first example, we will create a simple map of the continental United States. We get the data from the maps package and plot it using ggplot2.

states_map <- map_data("state")

Let us look at the structure of the data we retrieved from maps. The data is stored as a data frame and contains observations that are characterized by unique combinations of longitude and latitude values. Each observation has the following attributes: group, order, region, and subregion if applicable.

head(states_map)
##        long      lat group order  region subregion
## 1 -87.46201 30.38968     1     1 alabama      <NA>
## 2 -87.48493 30.37249     1     2 alabama      <NA>
## 3 -87.52503 30.37249     1     3 alabama      <NA>
## 4 -87.53076 30.33239     1     4 alabama      <NA>
## 5 -87.57087 30.32665     1     5 alabama      <NA>
## 6 -87.58806 30.32665     1     6 alabama      <NA>

The longitude and latitude information denotes points along the borders of countries and geographical regions. We can represent them as points in the x-y-coordinate system, plotting the longitude along the x-axis and latitude along the y-axis.

ggplot(states_map, aes(x = long, y = lat)) + geom_point()

Not really what we are aiming for, but it’s a start.

The group and order variables in the data set code relational information of the points. For example, there are 49 regions (all states minus Alaska and Hawaii, plus District of Columbia), 63 groups, and a varying number of points (observations) within each group. The observations denote the border points we plotted previously, and the order counter establishes the sequence in which they should be plotted.

head(table(states_map$region))
## 
##     alabama     arizona    arkansas  california    colorado connecticut 
##         202         149         312         516          79          91

We can use a geom_polygon() layer to plot the observations as polygons, rather than points. In order to do that, we need to specify the grouping parameter. If the data wasn’t ordered correctly, we could use the dplyr::arrange() function to establish the correct sequencing of points for each polygon.

ggplot(states_map, aes(x = long, y = lat, group = group)) + geom_polygon()

The map appears to be a bit “squished”. This is because the scaling of the x-axis and y-axis is not based on the scaling of longitude and latitude. We can pass special mapping parameters to ggplot2 via the coord_map() command to achieve the correct aspect ratio or use different map projections. Note that the mapproj library is required to use this feature.

ggplot(states_map, aes(x = long, y = lat, group = group)) + geom_polygon() +
    theme_minimal() + coord_map()  # default is Mercator projection

ggplot(states_map, aes(x = long, y = lat, group = group)) + geom_polygon() +
    theme_minimal() + coord_map("mollweide")

So far we explored a simple way to read and plot maps. As we will see in the next sections, more advanced and flexible methods are available.

Simple Features

Most packages for working with geo-data in R rely on the Simple Features standard, which describes how objects in the real world can be represented in computers in terms of their spatial geometry. A feature has a geometry describing where the feature is located on Earth, plus a set of attributes describing other properties.

For example:

  • geometry: point at -73.94°, 40.6°
  • name: New York City
  • population: 8.6 M

Or also:

  • geometry: polygons with points at different coordinates
  • name: Italy
  • population: 60 M
  • most popular meal: pizza

A spatial dataset contains geodata in a geometry column and further attributes, like the ones we just saw.

World map

Now, we make a world map. First, we load the required packages:

library(ggplot2)
library(maps)
library(sf)

We can use map() to load the world data; then we convert it to a “simple features” object with st_as_sf():

worldmap_data <- st_as_sf(map("world", plot = FALSE, fill = TRUE))
head(worldmap_data, n = 3)  # to have a look inside
## Simple feature collection with 3 features and 1 field
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -70.06612 ymin: -18.01973 xmax: 74.89131 ymax: 38.4564
## Geodetic CRS:  WGS 84
##            ID                           geom
## 1       Aruba MULTIPOLYGON (((-69.89912 1...
## 2 Afghanistan MULTIPOLYGON (((74.89131 37...
## 3      Angola MULTIPOLYGON (((23.9665 -10...

We recognize this as a spatial dataset with a header giving some general info (geometry type, dimension, etc.), and two columns: ID (attribute) and geom (spatial data).

What are the numbers in the geom column?

We make a basic plot of the world map by adding a geom_sf layer to a ggplot() call:

ggplot() + geom_sf(data = worldmap_data)

Adding points

Suppose you have a separate dataframe with the coordinates of a few landmarks you want to add to the map.

## # A tibble: 3 × 3
##   name      long   lat
##   <chr>    <dbl> <dbl>
## 1 Berlin    13.4  52.5
## 2 New York -73.9  40.6
## 3 Sydney   151.  -33.9

We can add these by simply adding a geom_point layer:

ggplot(some_cities) + geom_sf(data = worldmap_data) + geom_point(aes(x = long,
    y = lat), color = "red")

Of course we can also add labels with geom_label (which draws a box around the text) or geom_text (which does not). While we are at it, we also get rid of the quite uninformative axis text and labels:

ggplot(some_cities) + geom_sf(data = worldmap_data) + geom_point(aes(x = long,
    y = lat), color = "red") + geom_label(aes(x = long, y = lat,
    label = name), hjust = 0, vjust = 1, nudge_x = 3) + theme(axis.title = element_blank(),
    axis.text = element_blank())

Another improvement would be to turn off the plot grid.

Country names vs. Country codes

Different names may refer to the same country (e.g. Germany / Federal Republic of Germany / BRD; Italy / Italian Republic / Republic of Italy), therefore they are often not a good identifier.

ISO 3166-1 designates every country a two- or three-letter code (e.g. DE / DEU; IT / ITA) and it is often used in datasets.

Let’s create a map of Europe highlighting a random sample of EU countries together with their ISO codes: we’ll use Natural Earth as map provider.

library(rnaturalearth)
library(ggrepel)

# get Europe map from Natural Earth
world <- ne_countries(type = "map_units", returnclass = "sf",
    continent = "europe")
# set coordinate system to EPSG 3035 (European LAEA
# projection) (more info: https://epsg.io/3035)
world <- st_transform(world, 3035)
world <- st_crop(world, xmin = 0, xmax = 7e+06, ymin = 0, ymax = 5e+06)
labelsdf <- data.frame(label = sprintf("%s (%s)\nISO Codes: %s / %s",
    world$name, world$formal_en, world$iso_a2, world$iso_a3),
    stringsAsFactors = FALSE)
labels_coords <- as.data.frame(st_coordinates(st_centroid(world$geometry)))
labelsdf$x <- labels_coords$X
labelsdf$y <- labels_coords$Y

set.seed(9)
world$show_label <- sample(c(TRUE, FALSE), nrow(labelsdf), replace = TRUE,
    prob = c(0.15, 0.85)) & !is.na(world$iso_a2)
labelsdf <- labelsdf[world$show_label, ]

ggplot() + geom_sf(aes(fill = show_label), data = world) + geom_label_repel(aes(label = label,
    x = x, y = y), data = labelsdf, size = 3, min.segment.length = 0,
    alpha = 0.85) + scale_fill_brewer(guide = "none", palette = "Dark2") +
    labs(title = "Random sample of European countries w/ ISO codes",
        caption = "source: Natural Earth Data") + theme_bw() +
    theme(axis.text = element_blank(), axis.title = element_blank(),
        axis.ticks = element_blank())

NUTS in EU

The Nomenclature of Territorial Units for Statistics (NUTS: Nomenclature des Unités Territoriales Statistiques) is a geocode standard developed by Eurostat that divides the EU territory into regions at 3 different levels for socio-economic analyses of the regions.

Candidate countries and countries inside EFTA also have NUTS regions (Norway, Switzerland, Albania, Turkey, etc.).

Italy has 3 levels of NUTS:

  • 5 Groups of regions (NUTS 1) - e.g., Northwest, Northeast, Central, South, Insular
  • 21 Regions (NUTS 2)
  • 107 Provinces (NUTS 3)

Below the NUTS levels, Italy has two levels of local administrative units: level 1 corresponds to NUTS 3, and level 2 includes the 8094 Municipalities.

In the following, we plot the map of Italy at the NUTS level 2, annotating a number of randomly chosen regions.

# read NUTS level-2 info from Eurostat
nutsrg <- read_sf("https://raw.githubusercontent.com/eurostat/Nuts2json/master/pub/v2/2021/3857/20M/nutsrg_2.json")
# [optional] set coordinate system to EPSG 3857 (Web
# Mercator)
st_crs(nutsrg) <- 3857
# restrict to Italy NUTS codes
nutsrg <- nutsrg[grepl("^IT", nutsrg$id), ]

labelsdf <- data.frame(label = sprintf("%s: %s", nutsrg$id, nutsrg$na),
    stringsAsFactors = FALSE)
labels_coords <- as.data.frame(st_coordinates(st_centroid(nutsrg$geometry)))
labelsdf <- bind_cols(labelsdf, labels_coords)
nutsrg$show_label <- rep(FALSE, nrow(nutsrg))

set.seed(10)
nutsrg$show_label[sample(1:nrow(nutsrg), 4)] <- TRUE
labelsdf <- labelsdf[nutsrg$show_label, ]

ggplot() + geom_sf(aes(fill = show_label), data = nutsrg) + geom_label_repel(aes(label = label,
    x = X, y = Y), data = labelsdf, size = 3, min.segment.length = 0,
    alpha = 0.85) + scale_fill_brewer(guide = "none", palette = "Dark2") +
    labs(title = "Random sample of Italy NUTS level 2", caption = "source: Eurostat / https://github.com/eurostat/Nuts2json") +
    theme_bw() + theme(axis.text = element_blank(), axis.title = element_blank(),
    axis.ticks = element_blank())

Sources for geodata

A number of R libraries come already with geodata, or provide ways to download them programmatically. For example:

  • maps: World, USA, US states, US counties, and more
  • mapdata: World in higher resolution, China, Japan and more
  • mapIT: contains Italian maps
  • rnaturalearth: R package for interacting with natural earth vector map data
  • OpenStreetMap: Access to the OpenStreetMap API

Natural Earth Data

naturalearthdata.com: *Natural Earth is a public domain map dataset available at 1:10m, 1:50m, and 1:110m scales. Featuring tightly integrated vector and raster data, with Natural Earth you can make a variety of visually pleasing, well-crafted maps with cartography or GIS software.

Provides vector data for:

  • countries and provinces, departments, states, etc.
  • populated places (capitals, major cities and towns)
  • physical features such as lakes, rivers, etc.

You can either download the data directly from the website or (even better) use the package rnaturalearth.

Open Street Map

  • provides even more detail than Natural Earth Data: streets, pathways, bus stops, metro lines, etc.
  • GeoFabrik provides downloads of the raw data
  • much harder to work with because of the complexity of the data.

OSM Admin Boundaries Map is a web-service to download administrative boundaries worldwide for different levels in different formats (shapefile, GeoJSON, etc.). It also contains meta-data depending on the country.

File formats

There are many file formats that store data for geographic information system (GIS) software. The most popular include:

  • ESRI shapefile (.shp)
  • GeoJSON (.geojson, .json)
  • Keyhole Markup Language KML (.kml, .kmz), used by Google Earth
  • Geography Markup Language GML (.gml)

Additional formats exist, such as vector data files (.svg, .poly) or databases (.sql, .sqlite) - although they are not specific for geodata, they can be used for this purpose.

The good news is that the sf library can handle all popular file formats!

Linking data with dplyr: recap

Suppose you have two datasets: pm, a table with particulate matter measurements in some cities; and city_coords, a table with GPS coordinates of many cities.

pm dataset
city pm_mg
Amman 999
Saltillo 869
Usak 814
The Bronx 284
Seoul 129
city_coords dataset
city lng lat
Amman 31.910 35.948
Saltillo -100.994 25.434
Berlin 13.402 52.520
Usak 29.403 38.673
New York -73.940 40.600
Seoul 126.984 37.536

To combine these datasets (for example, to create a choropleth map), we can use the city column for matching. We note that not all cities in pm also appear in city_coords, and vice versa.

We recall the dplyr’s three different ways of joining tables:

  • left_join(a, b, by=column) always retains rows on the left side and fills up non-matching rows with NAs.
knitr::kable(left_join(pm, city_coords, by = "city"))
city pm_mg lng lat
Amman 999 31.910 35.948
Saltillo 869 -100.994 25.434
Usak 814 29.403 38.673
The Bronx 284 NA NA
Seoul 129 126.984 37.536
  • right_join(a, b, by=column)` always retains rows on the right side and fills up non-matching rows with NAs.
knitr::kable(right_join(pm, city_coords, by = "city"))
city pm_mg lng lat
Amman 999 31.910 35.948
Saltillo 869 -100.994 25.434
Usak 814 29.403 38.673
Seoul 129 126.984 37.536
Berlin NA 13.402 52.520
New York NA -73.940 40.600
  • right_join(a, b, by=column)` only retains rows that match on both sides.
knitr::kable(inner_join(pm, city_coords, by = "city"))
city pm_mg lng lat
Amman 999 31.910 35.948
Saltillo 869 -100.994 25.434
Usak 814 29.403 38.673
Seoul 129 126.984 37.536

Choropleth maps

In choropleth maps, regions are shaded according to the values of some variable, e.g.:

knitr::include_graphics("https://www.un.org/waterforlifedecade/images/scarcity/2013_scarcity_graph_2.png")

By the way, the above picture is a nice example of bad color scheme (why?).

We now want to create a choropleth map using data from the Berlin Senate Department for Urban Development and Housing (through FIS Broker). The data include indicators for monitoring of social development.

# get the geoJSON
bln_soc <- read_sf("https://github.com/WZBSocialScienceCenter/r-geodata-workshop/raw/master/data/bln_plr_sozind.geojson")
head(bln_soc)
## Simple feature collection with 6 features and 10 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: 386668.2 ymin: 5817761 xmax: 390764.3 ymax: 5820432
## Projected CRS: ETRS89 / UTM zone 33N
## # A tibble: 6 × 11
##   PLANNAME        EW2016 STATUS1 STATUS2 STATUS3 STATUS4 DYNAMO1 DYNAMO2 DYNAMO3
##   <chr>           <chr>    <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
## 1 Stülerstraße    3114      4.98    1.52    8.16   20.5    -0.19    0.14   -0.42
## 2 Großer Tiergar… 192      NA      NA      NA      NA      NA      NA      NA   
## 3 Lützowstraße    5518      6.83    2.08   17.7    34.3    -0.5    -0.03   -0.8 
## 4 Körnerstraße    4704      6.27    1.49   20.7    45.9    -1.33   -0.76   -1.66
## 5 Nördlicher Lan… 1135      2.06    0       1.5     3.55   -0.77   -0.74   -0.42
## 6 Wilhelmstraße   2190      3.64    0.89    8.45   27.3    -1.23   -0.59    0.13
## # … with 2 more variables: DYNAMO4 <dbl>, geometry <MULTIPOLYGON [m]>

Our aim is to visualize the spatial distribution of STATUS4, representing child poverty rate in percent (i.e., the portion of children \(<15\) y.o. living in a household that obtains social support).

head(bln_soc[c("PLANNAME", "STATUS4", "geometry")], 5)
## Simple feature collection with 5 features and 2 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: 386668.2 ymin: 5817761 xmax: 389894.1 ymax: 5820432
## Projected CRS: ETRS89 / UTM zone 33N
## # A tibble: 5 × 3
##   PLANNAME                 STATUS4                                      geometry
##   <chr>                      <dbl>                            <MULTIPOLYGON [m]>
## 1 Stülerstraße               20.5  (((387256.6 5818552, 387258.6 5818545, 38726…
## 2 Großer Tiergarten          NA    (((386767.5 5819393, 386767.3 5819393, 38676…
## 3 Lützowstraße               34.3  (((387952.6 5818275, 387974.2 5818265, 38799…
## 4 Körnerstraße               45.9  (((388847.1 5817875, 388881.2 5817863, 38897…
## 5 Nördlicher Landwehrkanal    3.55 (((388129.5 5819015, 388142.8 5818978, 38815…

The first attempt is to directly plot the map with fill aesthetics depending on STATUS4:

ggplot(bln_soc) + geom_sf(aes(fill = STATUS4))

But there’s a problem: STATUS4 is continuous, thus the color scale is continuous too. For choropleth maps, discrete ranges are better for discerning the colors.

We need to do some binning: we decide to set 5 discrete bins (equally spaced from 0 to 100%). In the revised plot, we also change the color palette and adjust the appearance:

# binning
bln_soc$child_pov_bins <- cut(bln_soc$STATUS4, seq(0, 100, by = 20))
# new plot
ggplot(bln_soc) + 
    geom_sf(aes(fill = child_pov_bins)) +
    scale_fill_brewer(palette = 'OrRd', na.value = "grey90",
                      guide = guide_legend(title = 'Child pov.\nranges (%)')) +
    coord_sf(datum = NA) +  # disable lines ("graticule") in the background
    theme_void()

Combining data

Most of the time, you’ll have at least two datasets: one containing the measurement(s) you want to show, and the other containing the geodata.

In the following example, we focus on data related to People at risk of poverty or social exclusion by NUTS level-2 regions (tgs00107) from Eurostat.

First, we load the data:

pov_risk <- read_csv("https://github.com/WZBSocialScienceCenter/r-geodata-workshop/raw/master/data/tgs00107_pov_risk_nuts2.csv")
pov_risk$risk_pct_bins <- cut(pov_risk$risk_pct, seq(0, 100,
    by = 10))
pov_risk_2016 <- filter(pov_risk, year == 2016)  # 2016 has fewest NAs
head(pov_risk_2016)
## # A tibble: 6 × 4
##   nuts   year risk_pct risk_pct_bins
##   <chr> <dbl>    <dbl> <fct>        
## 1 AT     2016     18   (10,20]      
## 2 AT11   2016     15.1 (10,20]      
## 3 AT12   2016     13   (10,20]      
## 4 AT13   2016     26   (20,30]      
## 5 AT21   2016     16.3 (10,20]      
## 6 AT22   2016     18.4 (10,20]

Next, we load the GeoJSON containing the NUTS level-2 regions provided by Eurostat (the same we already used in one of the above examples):

# read NUTS level-2 info from Eurostat
nutsrg <- read_sf("https://raw.githubusercontent.com/eurostat/Nuts2json/master/pub/v2/2021/3857/20M/nutsrg_2.json")
head(nutsrg)
## Simple feature collection with 6 features and 2 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: 886184 ymin: 6159172 xmax: 2098645 ymax: 6623262
## Geodetic CRS:  WGS 84
## # A tibble: 6 × 3
##   id    na                                                              geometry
##   <chr> <chr>                                                 <MULTIPOLYGON [°]>
## 1 CZ05  Severovýchod    (((1729426 6582279, 1758628 6576034, 1786629 6556127, 1…
## 2 CZ06  Jihovýchod      (((1725026 6422248, 1825031 6373848, 1870634 6376971, 1…
## 3 CZ07  Střední Morava  (((2048642 6342623, 2039842 6328571, 2022641 6319594, 2…
## 4 CZ08  Moravskoslezsko (((2007840 6457767, 2040242 6433957, 2067843 6430835, 2…
## 5 DE11  Stuttgart       (((1122596 6367603, 1126196 6355503, 1125796 6340281, 1…
## 6 DE12  Karlsruhe       (((1068993 6347697, 1051393 6336768, 1020191 6330523, 1…

We notice that both datasets contain a NUTS level-2 identifier, so it is easy to join them:

pov_risk_2016_geo <- left_join(nutsrg, pov_risk_2016, by = c(id = "nuts"))
head(pov_risk_2016_geo)
## Simple feature collection with 6 features and 5 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: 886184 ymin: 6159172 xmax: 2098645 ymax: 6623262
## Geodetic CRS:  WGS 84
## # A tibble: 6 × 6
##   id    na                                 geometry  year risk_pct risk_pct_bins
##   <chr> <chr>                    <MULTIPOLYGON [°]> <dbl>    <dbl> <fct>        
## 1 CZ05  Severovýchod    (((1729426 6582279, 175862…  2016     11.7 (10,20]      
## 2 CZ06  Jihovýchod      (((1725026 6422248, 182503…  2016     11.5 (10,20]      
## 3 CZ07  Střední Morava  (((2048642 6342623, 203984…  2016     13.1 (10,20]      
## 4 CZ08  Moravskoslezsko (((2007840 6457767, 204024…  2016     22.1 (20,30]      
## 5 DE11  Stuttgart       (((1122596 6367603, 112619…  2016     14.6 (10,20]      
## 6 DE12  Karlsruhe       (((1068993 6347697, 105139…  2016     16.2 (10,20]

Now we have everything we need to plot the data as a choropleth map!

ggplot(pov_risk_2016_geo) + 
    geom_sf(aes(fill = risk_pct_bins), size=0.1) +
    scale_fill_brewer(palette="OrRd", na.value="grey90", 
                      guide=guide_legend(title="Pct. of people at\nrisk of poverty")) +
    labs(title="Poverty in the EU (2016)",
         subtitle="Percentage of people at risk of poverty or social exclusion",
         caption="src: Eurostat") +
    coord_sf(datum=NA) +  # disable lines ("graticule") in the background
    theme_void()

Visualizing spatio-temporal data

The pov_risk dataset has a time dimension (column year): so it is tempting to create spatio-temporal visualizations to take advantage of this.

We have a couple of options:

  • we can create a plot with facets (facet_wrap);
  • to show a dynamic evolution along time, we can create an animation with gganimate.

Facet plot

To prepare data for plotting, we focus on 4 years only: 2010, 2012, 2014, and 2016. When dealing with spatio-temporal data, we have to consider that maps change over time! In fact, the NUTS regions are released by Eurostat for 2010, 2013, and 2016.

So there’s a little overhead here: we need to assign each year in pov_risk dataframe the corresponding map year (i.e., “data year” 2012 should be assigned the “map year” 2010 because it is the closest match in the past). Then, we would need to fetch the GeoJSON files for each “map year”.

# 0. select 2010, 2012, 2014, 2016
pov_risk_10_to_16 <- filter(pov_risk, year %in% seq(2010, 2016, by = 2))
# 1. assign each year in the poverty data the corresponding year of the map
pov_risk_10_to_16 <- mutate(pov_risk_10_to_16, 
                            map_year = case_when(year < 2013 ~ 2010L,
                                                 year == 2014 ~ 2013L,
                                                 year > 2014 ~ 2016L))

# 2. load geo-data for each available year (2010, 2013 and 2016)
map_yrs <- unique(pov_risk_10_to_16$map_year)
nutsrg_per_yr <- list()
i <- 1
for (yr in map_yrs) {
    nutsrg_per_yr[[i]] <- read_sf(glue::glue("https://raw.githubusercontent.com/eurostat/Nuts2json/master/pub/v2/{yr}/3857/20M/nutsrg_2.json"))
    nutsrg_per_yr[[i]]$map_year <- yr
    i <- i + 1
}

# generate a single data frame with each year's geo-data
nutsrg_all_yrs <- do.call(rbind, nutsrg_per_yr)

# join geo-data and poverty data also on year level
pov_risk_10_to_16_geo <- left_join(nutsrg_all_yrs, pov_risk_10_to_16, by = c('id' = 'nuts', 'map_year'))

# repeat missing geo-data for each year and set the respective year
yr10_na <- filter(pov_risk_10_to_16_geo,  map_year == 2010L & is.na(year))
yr10_na$year <- 2010
pov_risk_10_to_16_geo <- rbind(pov_risk_10_to_16_geo, yr10_na)
yr10_na$year <- 2012
pov_risk_10_to_16_geo <- rbind(pov_risk_10_to_16_geo, yr10_na)
yr13_na <- filter(pov_risk_10_to_16_geo,  map_year == 2013L & is.na(year))
yr13_na$year <- 2014
pov_risk_10_to_16_geo <- rbind(pov_risk_10_to_16_geo, yr13_na)
yr16_na <- filter(pov_risk_10_to_16_geo,  map_year == 2016L & is.na(year))
yr16_na$year <- 2016
pov_risk_10_to_16_geo <- rbind(pov_risk_10_to_16_geo, yr16_na)

# drop the rows with the missing geo-data
pov_risk_10_to_16_geo <- filter(pov_risk_10_to_16_geo, !is.na(year))

# 3. make the map using facets
ggplot(pov_risk_10_to_16_geo) + 
    geom_sf(aes(fill = risk_pct_bins), size = 0.1) +
    scale_fill_brewer(palette="OrRd", na.value="grey90", 
                      guide = guide_legend(title = 'Pct. of people at\nrisk of poverty')) +
    labs(title = 'Poverty in southern EU 2016 (NUTS level-2)',
         subtitle = 'Percentage of people at risk of poverty or social exclusion',
         caption = 'src: Eurostat') +
    coord_sf(datum = NA, xlim = c(-12e5, 35e5), ylim = c(41e5, 68e5)) +
    facet_wrap(~ year, nrow = 2) +   # facets per year
    theme_bw() + theme(axis.text = element_blank(), 
                       axis.title = element_blank(),
                       axis.ticks = element_blank(),
                       legend.position = 'bottom')

Animated plot

library(gganimate)

p <- ggplot(pov_risk_10_to_16_geo) + geom_sf(aes(fill = risk_pct_bins),
    size = 0.1) + scale_fill_brewer(palette = "OrRd", na.value = "grey90",
    guide = "none") + coord_sf(datum = NA, xlim = c(-1200000,
    3500000), ylim = c(4100000, 6800000)) + theme_bw() + theme(axis.text = element_blank(),
    axis.title = element_blank(), axis.ticks = element_blank(),
    legend.position = "bottom", title = element_text(size = 7),
    legend.text = element_text(size = 5)) + transition_manual(year) +
    labs(title = "{current_frame} - Poverty in southern EU {current_frame} (NUTS level-2)",
        subtitle = "Percentage of people at risk of poverty or social exclusion",
        caption = "source: Eurostat")

animate(p, width = 700, height = 450, duration = 5, renderer = ffmpeg_renderer(format = "webm"))

Projections

While you can unroll a cylinder or a cone to a sheet without tearing or stretching, you are unable to do the same with a sphere or an ellipsoid: they are “non-developable surfaces”.

Orange peel
source: flickr.com/photos/waferboard

Map projections try to do the impossible: project points of a sphere to a plane. We already worked with WGS84 coordinates, which are locations on a sphere (ellipsoid) defined as degrees. The points from spherical/ellipsoidal coordinate system are converted to a “flat surface” cartesian coordinate system via a map projection.

All map projections distort: area, shape, direction, distance and other properties can be different than on the actual sphere.

Mercator projection:

  • very popular
  • strong distortions when far away from equator (Scandinavia, Antarctica, etc.)

Mollweide projection:

  • equal-area projection
  • shape distortions when far away from center

Goode homolosine projection:

  • equal-area projection
  • no distortion around equator
  • looks a bit like the peeled orange above

Spatial datasets usually specify the CRS, so you should be able to set or transform it (e.g., st_transform(map, 3035)). You may want to use a different projection to have less distortion in the area of your interest. Another important point that you should take into account is the compatibility between datasets: for example, you may have country shapes with a specific CRS, and geocoded points with a different CRS.

Additional Resources